73 research outputs found

    Towards the cloudification of the social networks analytics

    In recent years, with the growth of available data from social networks and the rise of big data technologies, social data has emerged as one of the most profitable markets for companies seeking to increase their profits. In addition, social computation scientists see such data as a vast ocean of information for studying modern human societies. Today, enterprises and researchers either develop their own mining tools in house or outsource their social media mining needs to specialised companies, with the consequent economic cost. In this paper, we present the first cloud computing service that facilitates the deployment of social media analytics applications, allowing data practitioners to use social mining tools as a service. The main advantage of this service is the possibility of running different queries at the same time and combining their results in real time. Additionally, we introduce twearch, a prototype for developing Twitter mining algorithms as services in the cloud. Peer Reviewed. Postprint (author’s final draft)

    Modeling projections in microaggregation

    Microaggregation is a method used by statistical agencies to limit the disclosure of sensitive microdata. It has been proven that microaggregation is an NP-hard problem when more than one variable is microaggregated at the same time. To solve this problem heuristically, a few methods based on projections have been introduced in the literature. The main drawback of such methods is that the projected axis is computed by maximizing a statistical property (e.g., the global variance of the data), disregarding the fact that the aim of microaggregation is to keep the disclosure risk as low as possible for all records. In this paper we present some preliminary results on the application of aggregation functions for computing the projected axis. We show that, by using the Sugeno integral to calculate the projected axis, we can in some cases reduce the disclosure risk of the protected data (when projected microaggregation is used). Postprint (author’s final draft)

    Blocking anonymized data

    Privacy is an important issue, and for this reason many researchers are working on the development of new data protection methods. The aim of these methods is to minimize the disclosure risk (DR) while preserving data utility. Consequently, there is an increasing demand for better methods to evaluate the DR. A standard measure for evaluating disclosure risk is record linkage (RL). Normally, when data sets are very large, RL has to split the data sets into blocks to reduce its computational cost. Standard blocking methods need a non-protected attribute to build the blocks and, for this reason, they are not a good option when the protected data set is completely masked. In this paper, we propose a new blocking method which does not need a blocking key to build the blocks and is therefore suitable for splitting fully protected data sets. The method is based on aggregation operators, in particular on the OWA operator. Peer Reviewed. Postprint (author’s final draft)
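
    For readers unfamiliar with the OWA (ordered weighted averaging) operator named in this abstract, the following is a minimal sketch of its standard definition only. It is not the blocking method of the paper, which the abstract does not describe; the function name `owa` and the weight validation are assumptions made for illustration.

```python
def owa(values, weights):
    """Ordered weighted averaging of `values`.

    The weights are attached to rank positions (largest value first),
    not to particular inputs. Illustrative helper, not taken from the paper.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    ordered = sorted(values, reverse=True)  # rank the inputs, largest first
    return sum(w * v for w, v in zip(weights, ordered))


# Special cases: all weight on the first rank gives the maximum,
# all weight on the last rank gives the minimum,
# and uniform weights give the arithmetic mean.
maximum = owa([3, 1, 2], [1, 0, 0])
minimum = owa([3, 1, 2], [0, 0, 1])
mean = owa([3, 1, 2], [1 / 3, 1 / 3, 1 / 3])
```

    Because the weights depend only on rank, the result is unchanged when the inputs are permuted.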

    Fuzzy measures and integrals in re-identification problems

    In this paper we give an overview of our approach of using aggregation operators and, more specifically, fuzzy integrals for solving re-identification problems. We show that Choquet integrals are suitable for some kinds of problems. Postprint (author’s final draft)
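
    As background for the Choquet integral mentioned in this abstract, here is a minimal sketch of the standard discrete Choquet integral with respect to a fuzzy measure. It does not reproduce the re-identification approach itself. The function name `choquet`, the dictionary-based encoding of the measure, and the example numbers are all assumptions made for illustration.

```python
def choquet(values, measure):
    """Discrete Choquet integral.

    values:  dict mapping each criterion name to a non-negative score.
    measure: dict mapping frozensets of criterion names to a value in [0, 1];
             it must cover every coalition visited below.
    Illustrative helper, not taken from the paper.
    """
    order = sorted(values, key=values.get)  # criteria by ascending score
    total = 0.0
    previous = 0.0
    for i, name in enumerate(order):
        coalition = frozenset(order[i:])  # criteria scoring at least values[name]
        total += (values[name] - previous) * measure[coalition]
        previous = values[name]
    return total


# Hypothetical two-criterion example with a non-additive measure.
scores = {"a": 0.2, "b": 0.8}
mu = {
    frozenset(): 0.0,
    frozenset({"a"}): 0.3,
    frozenset({"b"}): 0.5,
    frozenset({"a", "b"}): 1.0,
}
result = choquet(scores, mu)  # (0.2 - 0) * 1.0 + (0.8 - 0.2) * 0.5
```

    When the measure is additive, the integral reduces to an ordinary weighted mean, which is a convenient sanity check.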

    Towards the use of OWA operators for record linkage

    Record linkage is used to establish links between records that, while belonging to two different files, correspond to the same individual. Classical approaches assume that the two files contain some common variables, which are the ones used to link the records. Recently, we introduced a new approach for linking records among files when such common variables are not available. In this approach, re-identification is based on so-called structural information. In this paper we study the use of OWA operators for extracting such structural information and, thus, allowing re-identification. Peer Reviewed. Postprint (author’s final draft)

    Revisiting distance-based record linkage for privacy-preserving release of statistical datasets

    Statistical Disclosure Control (SDC, for short) studies the problem of privacy-preserving data publishing in cases where the data is expected to be used for statistical analysis. An original dataset T containing sensitive information is transformed into a sanitized version T' which is released to the public. Both utility and privacy aspects are very important in this setting. For utility, T' must allow data miners or statisticians to obtain results similar to those which would have been obtained from the original dataset T. For privacy, T' must significantly reduce the ability of an adversary to infer sensitive information on the data subjects in T. One of the main a-posteriori measures that the SDC community has considered up to now when analyzing the privacy offered by a given protection method is the Distance-Based Record Linkage (DBRL) risk measure. In this work, we argue that the classical DBRL risk measure is insufficient. For this reason, we introduce the novel Global Distance-Based Record Linkage (GDBRL) risk measure. We claim that this new measure must be evaluated alongside the classical DBRL measure in order to better assess the risk in publishing T' instead of T. After that, we describe how this new measure can be computed by the data owner and discuss the scalability of those computations. We conclude with extensive experimentation in which we compare the risk assessments offered by our novel measure and by the classical one, using well-known SDC protection methods. Those experiments validate our hypothesis that the GDBRL risk measure yields, in many cases, higher risk assessments than the classical DBRL measure. In other words, relying solely on the classical DBRL measure for risk assessment might be misleading, as the true risk may in fact be higher. Hence, we strongly recommend that the SDC community consider the new GDBRL risk measure as an additional measure when analyzing the privacy offered by SDC protection algorithms. Postprint (author's final draft)
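
    To make the classical DBRL idea concrete, the sketch below links each protected record to its nearest original record and reports the fraction of correct links. It is a simplified illustration under stated assumptions, not the procedure evaluated in the paper, and it does not attempt the GDBRL measure, whose definition the abstract does not give. The assumptions are: records are numeric tuples, record i of the protected file is the masked version of record i of the original file, distance is plain Euclidean distance without standardization, and ties go to the lowest index. The function name `dbrl_risk` and the toy data are hypothetical.

```python
import math


def dbrl_risk(original, protected):
    """Fraction of protected records whose nearest original record
    (Euclidean distance) is their true source.

    Assumes protected[i] is the masked version of original[i].
    Illustrative helper, not taken from the paper.
    """
    correct = 0
    for i, record in enumerate(protected):
        # Index of the original record closest to this protected record.
        nearest = min(
            range(len(original)),
            key=lambda j: math.dist(record, original[j]),
        )
        if nearest == i:
            correct += 1
    return correct / len(protected)


# Toy data: the third protected record lands closer to the second
# original record, so only two of the three links are correct.
T = [(0.0, 0.0), (10.0, 10.0), (20.0, 0.0)]
T_protected = [(1.0, 0.0), (9.0, 11.0), (11.0, 9.0)]
risk = dbrl_risk(T, T_protected)
```

    This brute-force version compares every pair of records, so its cost grows quadratically with the number of records.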

    Dynamic reputation-based trust computation in private networks

    Technical Report IIIA-TR-2009-02. The use of collaborative network services in general, and web-based social network (WBSN) services in particular, is increasing and, therefore, the protection of the resources shared by network participants is becoming a crucial need. In a collaborative network, one of the main parameters on which access control relies is trust and reputation, since access to a resource may or may not be granted on the basis of the trust/reputation of the requesting node. Therefore, the calculation of the trust of the nodes becomes a very important issue, mainly in business-to-business (BtoB) social networks, where trustworthy nodes can increase their benefits by taking advantage of their good reputation in the network. To address this point, in this paper we propose a mechanism to dynamically compute the trust of nodes based on their past behavior. The key characteristic of our proposal is that trust is computed in a private way. This is achieved by anonymizing the local log files that store information about the actions of nodes. Preprint

    Towards the use of sequential patterns for detection and characterization of natural and agricultural areas

    Nowadays, a huge number of high-resolution satellite images are freely available. Such images allow researchers in environmental sciences to study different natural habitats and farming practices remotely. However, the content of satellite images strongly depends on the season of acquisition. Due to the periodicity of natural and agricultural dynamics throughout the seasons, sequential patterns arise as a new opportunity to model the behaviour of these environments. In this paper, we describe some preliminary results obtained with a new framework for studying spatiotemporal evolutions over natural and agricultural areas using k-partite graphs and sequential patterns extracted from segmented Landsat images. Postprint (author’s final draft)

    Anonymizing data via polynomial regression

    The amount of confidential information accessible through the Internet is growing continuously. In this scenario, the improvement of anonymizing methods becomes crucial to avoid revealing sensitive information about individuals. Among the several protection methods proposed, those based on linear regression are widely used. However, there is no reason to assume that linear regression is better than more complex polynomial regression. In this paper, we present PoROP-k, a family of anonymizing methods able to protect a data set using polynomial regressions. We show that PoROP-k not only reduces the loss of information but also achieves a better level of protection than previous proposals based on linear regressions. Postprint (author’s final draft)

    Increasing polynomial regression complexity for data anonymization

    Pervasive computing and increasing networking needs usually demand publishing data without revealing sensitive information. Among the several data protection methods proposed in the literature, those based on linear regression are widely used for numerical data. However, no attempts have been made to study the effect of using more complex polynomial regression methods. In this paper, we present PoROP-k, a family of anonymizing methods able to protect a data set using polynomial regressions. We show that PoROP-k not only reduces the loss of information but also achieves a better level of protection than previous proposals based on linear regressions. Peer Reviewed. Postprint (published version)